NSF PAR Search | NSF Public Access Repository

VL-TGS: Trajectory Generation and Selection Using Vision Language Models in Mapless Outdoor Environments

https://doi.org/10.1109/LRA.2025.3559822

Song, Daeun; Liang, Jing; Xiao, Xuesu; Manocha, Dinesh (April 2025, IEEE Robotics and Automation Letters)

We present a multi-modal trajectory generation and selection algorithm for real-world mapless outdoor navigation in human-centered environments. Such environments contain rich features like crosswalks, grass, and curbs, which are easily interpretable by humans, but not by mobile robots. We aim to compute suitable trajectories that (1) satisfy the environment-specific traversability constraints and (2) generate human-like paths while navigating on crosswalks, sidewalks, etc. Our formulation uses a Conditional Variational Autoencoder (CVAE) generative model enhanced with traversability constraints to generate multiple candidate trajectories for global navigation. We develop a visual prompting approach and leverage the Visual Language Model's (VLM) zero-shot ability of semantic understanding and logical reasoning to choose the best trajectory given the contextual information about the task. We evaluate our method in various outdoor scenes with wheeled robots and compare the performance with other global navigation algorithms. In practice, we observe an average improvement of 20.81% in satisfying traversability constraints and 28.51% in terms of human-like navigation in four different outdoor navigation scenarios.
Free, publicly-accessible full text available April 10, 2026
DARC: Disturbance-Aware Redundant Control for Human–Robot Co-Transportation

https://doi.org/10.3390/electronics14122480

Mahmud, Al Jaber; Raj, Amir Hossain; Nguyen, Duc M; Xiao, Xuesu; Wang, Xuan (June 2025, Electronics)

This paper introduces Disturbance-Aware Redundant Control (DARC), a control framework addressing the challenge of human–robot co-transportation under disturbances. Our method integrates a disturbance-aware Model Predictive Control (MPC) framework with a proactive pose optimization mechanism. The robotic system, comprising a mobile base and a manipulator arm, compensates for uncertain human behaviors and internal actuation noise through a two-step iterative process. At each planning horizon, a candidate set of feasible joint configurations is generated using a Conditional Variational Autoencoder (CVAE). From this set, one configuration is selected by minimizing an estimated control cost computed via a disturbance-aware Discrete Algebraic Riccati Equation (DARE), which also provides the optimal control inputs for both the mobile base and the manipulator arm. We derive the disturbance-aware DARE and validate DARC with simulated experiments with a Fetch robot. Evaluations across various trajectories and disturbance levels demonstrate that our proposed DARC framework outperforms baseline algorithms that lack disturbance modeling, pose optimization, or both.
Free, publicly-accessible full text available June 1, 2026
Multi-Strategy Deployment-Time Learning and Adaptation for Navigation under Uncertainty

Paudel, Abhishek; Xiao, Xuesu; Stein, Gregory J (November 2024, Conference on Robot Learning (CoRL))

Full Text Available
VLM-Social-Nav: Socially Aware Robot Navigation Through Scoring Using Vision-Language Models

https://doi.org/10.1109/LRA.2024.3511409

Song, Daeun; Liang, Jing; Payandeh, Amirreza; Raj, Amir Hossain; Xiao, Xuesu; Manocha, Dinesh (January 2025, IEEE Robotics and Automation Letters)

We propose VLM-Social-Nav, a novel Vision-Language Model (VLM) based navigation approach to compute a robot's motion in human-centered environments. Our goal is to make real-time decisions on robot actions that are socially compliant with human expectations. We utilize a perception model to detect important social entities and prompt a VLM to generate guidance for socially compliant robot behavior. VLM-Social-Nav uses a VLM-based scoring module that computes a cost term that ensures socially appropriate and effective robot actions generated by the underlying planner. Our overall approach reduces reliance on large training datasets and enhances adaptability in decision-making. In practice, it results in improved socially compliant navigation in human-shared environments. We demonstrate and evaluate our system in four different real-world social navigation scenarios with a Turtlebot robot. We observe at least 27.38% improvement in the average success rate and 19.05% improvement in the average collision rate in the four social navigation scenarios. Our user study score shows that VLM-Social-Nav generates the most socially compliant navigation behavior.
Full Text Available
VANP: Learning Where to See for Navigation with Self-Supervised Vision-Action Pre-Training

https://doi.org/10.1109/IROS58592.2024.10802451

Nazeri, Mohammad; Wang, Junzhe; Payandeh, Amirreza; Xiao, Xuesu (October 2024, IEEE)

Humans excel at efficiently navigating through crowds without collision by focusing on specific visual regions relevant to navigation. However, most robotic visual navigation methods rely on deep learning models pre-trained on vision tasks, which prioritize salient objects that are not necessarily relevant to navigation and are potentially misleading. Alternative approaches train specialized navigation models from scratch, requiring significant computation. On the other hand, self-supervised learning has revolutionized computer vision and natural language processing, but its application to robotic navigation remains underexplored due to the difficulty of defining effective self-supervision signals. Motivated by these observations, in this work, we propose a Self-Supervised Vision-Action Model for Visual Navigation Pre-Training (VANP). Instead of detecting salient objects that are beneficial for tasks such as classification or detection, VANP learns to focus only on specific visual regions that are relevant to the navigation task. To achieve this, VANP uses a history of visual observations, future actions, and a goal image for self-supervision, and embeds them using two small Transformer Encoders. Then, VANP maximizes the information between the embeddings by using a mutual information maximization objective function. We demonstrate that most VANP-extracted features match human navigation intuition. VANP achieves performance comparable to models learned end-to-end with half the training time, and to models trained on a large-scale, fully supervised dataset, i.e., ImageNet, with only 0.08% of the data.
Full Text Available
DTG: Diffusion-based Trajectory Generation for Mapless Global Navigation

https://doi.org/10.1109/IROS58592.2024.10802055

Liang, Jing; Payandeh, Amirreza; Song, Daeun; Xiao, Xuesu; Manocha, Dinesh (October 2024, IEEE)

We present a novel end-to-end diffusion-based trajectory generation method, DTG, for mapless global navigation in challenging outdoor scenarios with occlusions and unstructured off-road features like grass, buildings, bushes, etc. Given a distant goal, our approach computes a trajectory that satisfies the following goals: (1) minimize the travel distance to the goal; (2) maximize the traversability by choosing paths that do not lie in undesirable areas. Specifically, we present a novel Conditional RNN (CRNN) for diffusion models to efficiently generate trajectories. Furthermore, we propose an adaptive training method that ensures that the diffusion model generates more traversable trajectories. We evaluate our methods in various outdoor scenes and compare the performance with other global navigation algorithms on a Husky robot. In practice, we observe at least a 15% improvement in traveling distance and around a 7% improvement in traversability. Video and Code: https://github.com/jingGM/DTG.git.
Full Text Available
Team Coordination on Graphs: Problem, Analysis, and Algorithms

https://doi.org/10.1109/IROS58592.2024.10802095

Zhou, Yanlin; Limbu, Manshi; Stein, Gregory J; Wang, Xuan; Shishika, Daigo; Xiao, Xuesu (October 2024, IEEE)

Full Text Available
Bi-CL: A Reinforcement Learning Framework for Robots Coordination Through Bi-level Optimization

https://doi.org/10.1109/IROS58592.2024.10801728

Hu, Zechen; Shishika, Daigo; Xiao, Xuesu; Wang, Xuan (October 2024, IEEE)

Full Text Available
Conflict Avoidance in Social Navigation—a Survey

https://doi.org/10.1145/3647983

Mirsky, Reuth; Xiao, Xuesu; Hart, Justin; Stone, Peter (March 2024, ACM Transactions on Human-Robot Interaction)

A major goal in robotics is to enable intelligent mobile robots to operate smoothly in shared human-robot environments. One of the most fundamental capabilities in service of this goal is competent navigation in this “social” context. As a result, there has been a recent surge of research on social navigation, especially as it relates to the handling of conflicts between agents during social navigation. These developments introduce a variety of models and algorithms; however, because this research area is inherently interdisciplinary, many of the relevant papers are not comparable and there is no shared standard vocabulary. This survey aims at bridging this gap by introducing such a common language, using it to survey existing work, and highlighting open problems. It starts by limiting the boundaries of this survey to a narrow, yet highly common type of social navigation: conflict avoidance. Within this proposed scope, this survey introduces a detailed taxonomy of the conflict avoidance components. This survey then maps existing work into this taxonomy, while discussing papers using its framing. Finally, this article proposes some future research directions and open problems that are currently on the frontier of social navigation to aid ongoing and future research.
Full Text Available
Human Uncertainty-Aware MPC for Enhanced Human-Robot Collaborative Manipulation

https://doi.org/10.1109/ICPS59941.2024.10640020

Mahmud, Al Jaber; Nguyen, Duc M; Veiga, Filipe; Xiao, Xuesu; Wang, Xuan (May 2024, IEEE)

This paper presents the development of a novel control algorithm designed for tasks involving human-robot collaboration. By using an 8-DOF robotic arm, our approach aims to counteract human-induced uncertainties added to the robot's nominal trajectory. To address this challenge, we incorporate a variable within the regular Model Predictive Control (MPC) framework to account for human uncertainties, which are modeled as following a normal distribution with a non-zero mean and variance. Our solution involves formulating and solving an uncertainty-aware Discrete Algebraic Riccati Equation (ua-DARE), which yields the optimal control law for all joints to mitigate the impact of these uncertainties. We validate our methodology through theoretical analysis, demonstrating the effectiveness of the ua-DARE in providing an optimal control strategy. Our approach is further validated through simulation experiments using a Fetch robot model, where the results highlight a significant improvement in performance over a baseline algorithm that does not consider human uncertainty while solving for the optimal control law.
Full Text Available
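
For readers unfamiliar with the Discrete Algebraic Riccati Equation mentioned in the abstract above, the following is a minimal sketch of the standard (not uncertainty-aware) DARE solved by fixed-point iteration. It is not the authors' ua-DARE, whose formulation is not given in this listing. The function name `solve_dare`, the double-integrator matrices, and the cost weights are all illustrative assumptions.

```python
import numpy as np


def solve_dare(A, B, Q, R, iters=5000, tol=1e-10):
    """Iterate the standard Riccati recursion until P stops changing.

    Hypothetical helper for illustration only: it solves
        P = Q + A'PA - A'PB (R + B'PB)^{-1} B'PA
    and returns P together with the feedback gain K (u = -K x).
    """
    P = Q.copy()
    for _ in range(iters):
        # Gain implied by the current cost-to-go matrix P.
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P_next = Q + A.T @ P @ (A - B @ K)
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return P, K


# Assumed toy system: a double integrator sampled at 0.1 s.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.005],
              [0.1]])
Q = np.eye(2)            # assumed state cost
R = np.array([[0.1]])    # assumed input cost

P, K = solve_dare(A, B, Q, R)
```

The resulting gain gives the linear feedback law u = -K x for the undisturbed system. According to the abstract, the uncertainty-aware variant additionally accounts for a normally distributed human disturbance with non-zero mean, which this sketch does not model.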
